This paper presents a methodology for combining programming and mathematics to optimize elevator wait times. Based on simulated user data generated according to the canonical three-peak model of elevator traffic, we first develop a naive model from an intuitive understanding of elevator logic. We take into account a general array of features, including capacity, acceleration, and maximum wait-time thresholds, to model realistic circumstances adequately. Using the same evaluation framework, we then develop a Deep Q-Learning model in an attempt to match the hard-coded naive approach to elevator control. Throughout most of the paper we work under a Markov Decision Process (MDP) schema, but we later explore how this assumption fails to characterize the highly stochastic overall Elevator Group Control System (EGCS).
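As a toy illustration of the MDP framing, the control loop can be reduced to tabular Q-learning on a single elevator serving one fixed call. The environment, reward shape, and hyperparameters below are illustrative assumptions, not the paper's simulator:

```python
import random

# Hypothetical toy elevator MDP: state = current floor, actions = down/stay/up,
# reward penalizes distance to the floor where a passenger is waiting.
N_FLOORS = 5
ACTIONS = [-1, 0, 1]  # down, stay, up

def step(floor, action, target):
    nxt = min(max(floor + action, 0), N_FLOORS - 1)
    return nxt, -abs(nxt - target)  # closer to the call -> less negative reward

def train(episodes=2000, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    target = 0  # in this toy setup the passenger waits at the ground floor
    Q = {(s, a): 0.0 for s in range(N_FLOORS) for a in range(len(ACTIONS))}
    for _ in range(episodes):
        s = rng.randrange(N_FLOORS)
        for _ in range(10):  # short fixed-length episodes
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda x: Q[(s, x)])
            s2, r = step(s, ACTIONS[a], target)
            best_next = max(Q[(s2, x)] for x in range(len(ACTIONS)))
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

Q = train()
# The greedy policy from the top floor should move down toward the waiting call.
greedy_action = max(range(len(ACTIONS)), key=lambda a: Q[(N_FLOORS - 1, a)])
```

A Deep Q-Learning variant would replace the tabular `Q` dictionary with a neural network over a richer state (pending calls, load, direction), but the update rule is the same.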
Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with excellent transferability, achieving promising accuracy for zero-shot classification. To further improve its downstream performance, existing works propose additional learnable modules on top of CLIP and fine-tune them with few-shot training sets. However, the resulting extra training cost and data requirements severely hinder the efficiency of model deployment and knowledge transfer. In this paper, we introduce a free-lunch enhancement method, CALIP, which boosts CLIP's zero-shot performance via a parameter-free attention module. Specifically, we guide the visual and textual representations to interact with each other and explore cross-modal informative features via attention. As the pre-training has largely reduced the embedding distance between the two modalities, we discard all learnable parameters in the attention and update the multi-modal features bidirectionally, making the whole process parameter-free and training-free. In this way, the image features are blended with text-aware signals and the textual representations become visually guided for better adaptive zero-shot alignment. We evaluate CALIP on various benchmarks over 14 datasets for both 2D image and 3D point cloud few-shot classification, showing consistent zero-shot performance improvements over CLIP. Based on this, we further insert a small number of linear layers into CALIP's attention module and verify its robustness under the few-shot setting, which also achieves leading performance compared with existing methods. Those extensive experiments demonstrate the superiority of our approach for efficiently enhancing CLIP.
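To make the parameter-free idea concrete, a minimal sketch of a CALIP-style bidirectional attention update might look as follows; the normalization scheme, temperature, and mixing weight are our assumptions, not the released implementation:

```python
import numpy as np

def l2norm(x):
    # Unit-normalize feature rows, as CLIP embeddings conventionally are.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def calip_style_update(Fv, Ft, tau=0.07, alpha=0.5):
    """Bidirectionally mix visual patch features Fv (M, D) and text features
    Ft (C, D) using attention with no learnable parameters at all."""
    Fv, Ft = l2norm(Fv), l2norm(Ft)
    A_v = softmax(Fv @ Ft.T / tau)            # (M, C): patches attend to text
    A_t = softmax(Ft @ Fv.T / tau)            # (C, M): text attends to patches
    Fv_new = l2norm(Fv + alpha * (A_v @ Ft))  # text-aware visual features
    Ft_new = l2norm(Ft + alpha * (A_t @ Fv))  # visually guided text features
    return Fv_new, Ft_new

rng = np.random.default_rng(0)
Fv = rng.normal(size=(4, 8))   # 4 hypothetical patch embeddings
Ft = rng.normal(size=(3, 8))   # 3 hypothetical class-text embeddings
Fv2, Ft2 = calip_style_update(Fv, Ft)
```

Because nothing here is trained, the update can be applied at inference time on top of any frozen CLIP backbone.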
Besides image classification, Contrastive Language-Image Pre-training (CLIP) has accomplished extraordinary success for a wide range of vision tasks, including object-level and 3D space understanding. However, transferring the semantic knowledge learned from CLIP to more intricate, quantified target tasks, such as depth estimation with geometric information, remains challenging. In this paper, we propose to apply CLIP to zero-shot monocular depth estimation, termed DepthCLIP. We found that patches of an input image can respond to certain semantic distance tokens and are then projected to quantized depth bins for coarse estimation. Without any training, our DepthCLIP surpasses existing unsupervised methods and even approaches early fully-supervised networks. To the best of our knowledge, we are the first to conduct zero-shot adaptation from semantic language knowledge to quantified downstream tasks and to perform zero-shot monocular depth estimation. We hope our work can shed light on future research. The code is available at https://github.com/Adonis-galaxy/DepthCLIP.
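The patch-to-depth-bin projection can be sketched in a few lines. The distance tokens, bin centers, and temperature below are illustrative assumptions rather than the released DepthCLIP configuration:

```python
import numpy as np

# Hypothetical bin centers in meters, one per semantic distance token
# (e.g. "close", "near", "medium", "far", "remote"); values are illustrative.
DEPTH_BINS = np.array([1.0, 2.0, 4.0, 8.0, 16.0])

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_depth(patch_feats, token_feats, tau=0.01):
    """patch_feats: (P, D) image patch embeddings; token_feats: (K, D)
    embeddings of K semantic distance tokens. Each patch is softly assigned
    to the bins by cosine similarity, and its coarse depth is the
    similarity-weighted mean of the bin centers."""
    norm = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
    sim = norm(patch_feats) @ norm(token_feats).T  # (P, K) cosine similarity
    return softmax(sim / tau) @ DEPTH_BINS         # expected depth per patch

rng = np.random.default_rng(1)
tokens = rng.normal(size=(5, 16))
patches = tokens[[0, 4]]  # two patches aligned with the two extreme tokens
depths = patch_depth(patches, tokens)
```

The soft assignment is what lets a purely semantic model emit a quantized geometric estimate without any depth supervision.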
Recently, zero-shot and few-shot learning via Contrastive Vision-Language Pre-training (CLIP) have shown inspiring performance on 2D visual recognition, which learns to match images with their corresponding texts in open-vocabulary settings. However, it remains under-explored whether CLIP, pre-trained on large-scale image-text pairs in 2D, can be generalized to 3D recognition. In this paper, we identify such a setting to be feasible by proposing PointCLIP, which conducts alignment between CLIP-encoded point clouds and 3D category texts. Specifically, we encode a point cloud by projecting it onto multi-view depth maps without rendering, and aggregate the view-wise zero-shot predictions to achieve knowledge transfer from 2D to 3D. On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D. By just fine-tuning the lightweight adapter in the few-shot settings, the performance of PointCLIP can be largely improved. In addition, we observe a complementary property between PointCLIP and classical 3D-supervised networks. By simple ensembling, PointCLIP boosts the baseline's performance and even surpasses state-of-the-art models. Hence, PointCLIP is a promising alternative for effective 3D point cloud understanding via CLIP under low resource cost and data regime. We conduct thorough experiments on the widely adopted ModelNet10, ModelNet40, and the challenging ScanObjectNN to demonstrate the effectiveness of PointCLIP. The code is released at https://github.com/ZrrSkywalker/PointCLIP.
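The rendering-free projection step can be sketched as a scatter of points onto a per-view depth map. The grid size, coordinate convention, and nearest-point z-buffering below are illustrative assumptions:

```python
import numpy as np

def project_depth_map(points, size=8):
    """points: (N, 3) array with x, y already in [-1, 1] for this view;
    depth is taken as z. Each pixel keeps the nearest (smallest-z) point
    that lands on it, and stays 0 where no point projects."""
    depth = np.zeros((size, size))
    # Map x, y from [-1, 1] to integer pixel indices.
    ij = np.clip(((points[:, :2] + 1) / 2 * (size - 1)).round().astype(int),
                 0, size - 1)
    order = np.argsort(-points[:, 2])  # write far points first so that
    for (i, j), z in zip(ij[order], points[order, 2]):  # near ones overwrite
        depth[j, i] = z
    return depth

pts = np.array([[-1.0, -1.0, 0.5],   # lands in one corner
                [ 1.0,  1.0, 2.0],   # opposite corner, farther away
                [ 1.0,  1.0, 1.0]])  # same pixel, nearer -> kept
dm = project_depth_map(pts)
```

Repeating this for several rotated copies of the cloud yields the multi-view depth maps that a frozen CLIP visual encoder can then consume like ordinary grayscale images.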
Point cloud processing is a challenging task due to its sparsity and irregularity. Prior works introduce delicate designs on either local feature aggregators or global geometric architectures, but few combine both advantages. We propose Dual-Scale Point Cloud Recognition with High-frequency Fusion (DSPoint) to extract local-global features by concurrently operating on voxels and points. We reverse the conventional design of applying convolution on voxels and attention on points. Specifically, we disentangle point features through the channel dimension for dual-scale processing: one by point-wise convolution for fine-grained geometry parsing, the other by voxel-wise global attention for long-range structural exploration. We design a co-attention fusion module to blend the local-global modalities, which conducts inter-scale cross-modality interaction by communicating high-frequency coordinate information. Experiments and ablations on the widely adopted ModelNet40, ShapeNet, and S3DIS demonstrate the state-of-the-art performance of our DSPoint.
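The channel-wise disentanglement can be sketched as follows, with the two branches replaced by simplified stand-ins (a point-wise scaling for the convolution branch and mean-centering for the global-attention branch; both are illustrative assumptions, not the paper's operators):

```python
import numpy as np

def dual_scale(features):
    """features: (N, C) per-point features with C even. The first C/2
    channels feed a local (point-wise) branch, the remaining C/2 a global
    (voxel-wise) branch; outputs are fused back along the channel axis."""
    local_ch, global_ch = np.split(features, 2, axis=1)
    local_out = local_ch * 2.0                  # stand-in for point-wise conv
    global_out = global_ch - global_ch.mean(0)  # stand-in for global attention
    return np.concatenate([local_out, global_out], axis=1)  # channel fusion

x = np.arange(12, dtype=float).reshape(3, 4)  # 3 points, 4 channels
y = dual_scale(x)
```

Splitting along channels rather than duplicating the whole feature keeps the dual-scale processing roughly cost-neutral, which is the design choice the abstract's "disentangle through the channel dimension" refers to.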
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortions and the variation of image content, which complicate the distortion patterns across different scales and aggravate the difficulty of the regression problem in BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has explored learning strategies that make the regression model perform better. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression task by following an easy-to-hard schedule analogous to human learning. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets; the experimental results indicate that PMT-IQA outperforms the comparison approaches and that both the MS and PMT modules improve the model's performance.
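The easy-to-hard idea can be sketched as a loss schedule that shifts weight from a coarse quality-classification task to the harder score-regression task over training; the sigmoid form and its steepness are our assumptions, not the paper's exact scheme:

```python
import math

def progressive_weights(epoch, total_epochs, k=6.0):
    """Returns (w_cls, w_reg) summing to 1: early epochs emphasize the easy
    classification task, later epochs the harder regression task."""
    t = epoch / total_epochs
    w_reg = 1.0 / (1.0 + math.exp(-k * (t - 0.5)))  # sigmoid ramp-up
    return 1.0 - w_reg, w_reg

def total_loss(cls_loss, reg_loss, epoch, total_epochs):
    # Combined multi-task objective under the progressive schedule.
    w_cls, w_reg = progressive_weights(epoch, total_epochs)
    return w_cls * cls_loss + w_reg * reg_loss

early = progressive_weights(0, 100)    # classification-dominated
late = progressive_weights(100, 100)   # regression-dominated
```

Any monotone ramp would do; the point is that the regression head only dominates once the shared features have been shaped by the easier task.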
We present Second Thought, a new learning paradigm that enables language models (LMs) to re-align with human values. By modeling the chain of edits between value-unaligned and value-aligned text, with LM fine-tuning and additional refinement through reinforcement learning, Second Thought not only achieves superior performance on three value-alignment benchmark datasets but also shows strong human-value transfer learning ability in few-shot scenarios. The generated editing steps also offer better interpretability and ease of interactive error correction. Extensive human evaluations further confirm its effectiveness.
It has been observed in practice that applying pruning-at-initialization methods to neural networks and training the sparsified networks can not only retain the test performance of the original dense models but also sometimes slightly boost generalization. A theoretical understanding of such experimental observations is yet to be developed. This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization. Specifically, this work considers a classification task for overparameterized two-layer neural networks, where the network is randomly pruned at different rates at initialization. It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization performance; more surprisingly, the generalization bound improves as the pruning fraction grows. To complement this positive result, this work further shows a negative result: there exists a large pruning fraction such that, while gradient descent is still able to drive the training loss toward zero (by memorizing noise), the generalization performance is no better than random guessing. This further suggests that pruning can change the feature learning process, which leads to the performance drop of the pruned neural network. To our knowledge, this is the \textbf{first} generalization result for pruned neural networks, suggesting that pruning can improve a neural network's generalization.
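The random pruning-at-initialization studied here can be sketched in pure Python; the 1/(1 - fraction) rescaling that preserves the expected pre-activation is a common convention and an assumption on our part, not a detail stated in the abstract:

```python
import random

def prune_at_init(weights, fraction, seed=0):
    """weights: list of rows (hidden x input) for the first layer of a
    two-layer network. Each entry is kept independently with probability
    1 - fraction and rescaled so the pre-activation's expectation matches
    the dense network's."""
    rng = random.Random(seed)
    keep = 1.0 - fraction
    return [[w / keep if rng.random() < keep else 0.0 for w in row]
            for row in weights]

# A toy 4-hidden-unit, 10-input first layer, pruned at fraction 0.5.
W = [[0.5] * 10 for _ in range(4)]
W_sparse = prune_at_init(W, fraction=0.5)
n_zero = sum(w == 0.0 for row in W_sparse for w in row)
```

The mask is fixed before any training step, which is exactly what distinguishes this regime from iterative magnitude pruning.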
This work studies training one-hidden-layer overparameterized ReLU networks via gradient descent in the neural tangent kernel (NTK) regime, where, unlike in previous works, the networks' biases are trainable and are initialized to some constant rather than zero. The first set of results characterizes the convergence of the network's gradient descent dynamics. Surprisingly, it is shown that the network after sparsification can achieve convergence as fast as the original network. The contribution over previous work is that not only is the bias allowed to be updated by gradient descent under our setting, but a finer analysis is also given, improving the required width that ensures the network's closeness to its NTK. Secondly, the networks' post-training generalization bound is provided. A width-sparsity dependence is presented, which yields a sparsity-dependent localized Rademacher complexity and a generalization bound matching previous analyses (up to logarithmic factors). As a by-product, if the bias initialization is chosen to be zero, the width requirement improves the previous bound for shallow networks' generalization. Lastly, since the generalization bound depends on the smallest eigenvalue of the limiting NTK, and the bounds from previous works yield vacuous generalization, this work further studies the least eigenvalue of the limiting NTK. While it is not shown that trainable biases are necessary, trainable biases help identify a nice data-dependent region where a much finer analysis of the NTK's smallest eigenvalue can be conducted, leading to a much sharper lower bound than the previously known worst-case bound and, consequently, a non-vacuous generalization bound.
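As a concrete handle on the objects in this abstract, the empirical NTK Gram matrix of a one-hidden-layer ReLU network with constant-initialized trainable biases can be computed directly. Width, scaling, and inputs are illustrative; this is a sketch of the setting, not of the paper's analysis:

```python
import numpy as np

def ntk_gram(X, width=5000, bias_init=0.5, seed=0):
    """X: (n, d) inputs. Empirical NTK of f(x) = (1/sqrt(m)) * sum_r a_r *
    relu(w_r . x + b_r) with both w_r and b_r treated as trainable:
    K(x, x') = (1/m) * sum_r 1[pre_r(x) > 0] * 1[pre_r(x') > 0] * (x . x' + 1),
    where the extra +1 term comes from the bias gradient."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(width, X.shape[1]))
    b = np.full(width, bias_init)            # constant (nonzero) bias init
    act = ((X @ W.T + b) > 0).astype(float)  # (n, width) ReLU activation masks
    return (act @ act.T / width) * (X @ X.T + 1.0)

X = np.eye(2)  # two orthogonal unit inputs
K = ntk_gram(X)
```

The smallest eigenvalue of this Gram matrix is exactly the quantity whose lower bound the last paragraph sharpens; with trainable biases the `+ 1.0` term keeps it bounded away from zero on this toy input.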
Time-series anomaly detection is an important task and has been widely applied in industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining considerable labels at low cost: it enables customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain it is hard for people to write reasonable labeling functions, as time-series data is numerically continuous and difficult to interpret. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection through only a small number of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically using only a few labeled data points. All of these techniques are complementary and reinforce one another. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both the weak supervision and active learning areas. The system has also been tested in a real industrial scenario to demonstrate its practicality.
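The heuristic-rule idea can be made concrete with a pair of toy labeling functions and a simple combiner. The thresholds and the majority vote below are illustrative assumptions, not the system's automatically generated functions:

```python
# Weak-supervision convention: each labeling function votes per point,
# or abstains when its heuristic does not apply.
ABSTAIN, NORMAL, ANOMALY = -1, 0, 1

def lf_spike(series, i, factor=3.0):
    """Flag points far above the running mean of the history so far."""
    hist = series[:i]
    if len(hist) < 5:
        return ABSTAIN  # not enough history to trust the mean
    mu = sum(hist) / len(hist)
    return ANOMALY if series[i] > factor * max(mu, 1e-9) else NORMAL

def lf_jump(series, i, delta=5.0):
    """Flag sudden jumps relative to the previous point."""
    if i == 0:
        return ABSTAIN
    return ANOMALY if abs(series[i] - series[i - 1]) > delta else NORMAL

def combine(votes):
    """Majority vote over the non-abstaining labeling functions."""
    active = [v for v in votes if v != ABSTAIN]
    if not active:
        return ABSTAIN
    return ANOMALY if sum(active) * 2 > len(active) else NORMAL

series = [1.0, 1.1, 0.9, 1.0, 1.2, 9.0, 1.0]
labels = [combine([lf_spike(series, i), lf_jump(series, i)])
          for i in range(len(series))]
```

Writing such functions by hand is exactly what the abstract argues is hard for continuous time-series data, which is why the system generates them automatically from a few user-labeled points instead.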